feat(relay): steer deployments with rolling quota ledger #1367

Closed
maybeknott wants to merge 1 commit into therealaleph:main from maybeknott:fix/rolling-quota-ledger

Conversation

@maybeknott

Apps Script quota is consumed per relay invocation, but a plain round-robin selector has no memory of how heavily this client has used each deployment inside the recent quota window. When multiple script IDs are configured, continuing to select an already saturated deployment while another configured deployment is still locally underused wastes available capacity and increases the chance of quota-related relay stalls.

DomainFronter now keeps a per-script local ledger of selection timestamps in a rolling 24-hour window. Before choosing a script ID, the selector prunes expired observations and prefers non-blacklisted deployments whose local call count remains below the free-tier request budget. Both the single-request selector and the parallel fan-out selector use the same ledger so Apps Script batches and relay fan-out draw from the same local capacity model.
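
A minimal sketch of what such a per-script rolling ledger could look like, assuming timestamps are kept in insertion order so pruning only needs to inspect the front of each queue. The type `QuotaLedger`, its methods, and the `budget` parameter are hypothetical names for illustration; the actual `DomainFronter` fields and the concrete budget value are not shown in this description.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Rolling window named in the description: 24 hours.
pub const WINDOW: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical ledger: one queue of selection timestamps per script ID.
#[derive(Default)]
pub struct QuotaLedger {
    calls: HashMap<String, VecDeque<Instant>>,
}

impl QuotaLedger {
    /// Drop observations that are at least `WINDOW` old relative to `now`.
    /// Timestamps are pushed in order, so the oldest is always at the front.
    pub fn prune(&mut self, now: Instant) {
        for stamps in self.calls.values_mut() {
            while let Some(&oldest) = stamps.front() {
                if now.duration_since(oldest) >= WINDOW {
                    stamps.pop_front();
                } else {
                    break;
                }
            }
        }
    }

    /// Record one selection of `script_id` at time `now`.
    pub fn record(&mut self, script_id: &str, now: Instant) {
        self.calls
            .entry(script_id.to_owned())
            .or_default()
            .push_back(now);
    }

    /// Number of observations currently held for `script_id`.
    pub fn count(&self, script_id: &str) -> usize {
        self.calls.get(script_id).map_or(0, VecDeque::len)
    }

    /// True while the local call count is below `budget` (an assumed parameter).
    pub fn has_capacity(&self, script_id: &str, budget: usize) -> bool {
        self.count(script_id) < budget
    }
}
```

A selector built on this would call `prune` before each choice and treat `has_capacity` as a preference rather than a hard gate.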

The ledger records selections at dispatch time. That deliberately accounts for concurrent fan-out attempts and for requests that may still complete server-side after the Rust future is dropped. The ledger is a local steering signal rather than an authoritative Google quota reading: if every non-blacklisted deployment is locally saturated, the selector still returns a deployment instead of creating a client-side outage. This preserves connectivity for paid Workspace quotas, shared deployments whose external usage is invisible to this process, and cases where the local estimate is conservative.
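
The dispatch-time accounting and the all-saturated fallback could be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `Selector`, its fields, and the plain per-script counters (standing in for an already pruned timestamp ledger) are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical selector: round-robin cursor plus per-script call counts
/// for the current window (assumed to be pruned elsewhere).
pub struct Selector {
    script_ids: Vec<String>,
    counts: HashMap<String, usize>,
    budget: usize,
    cursor: usize,
}

impl Selector {
    pub fn new(script_ids: Vec<String>, budget: usize) -> Self {
        Self { script_ids, counts: HashMap::new(), budget, cursor: 0 }
    }

    /// Locally observed calls for `script_id`.
    pub fn count(&self, script_id: &str) -> usize {
        self.counts.get(script_id).copied().unwrap_or(0)
    }

    /// Pick a script and charge the ledger immediately, at dispatch time,
    /// rather than when (or if) a response arrives.
    pub fn select(&mut self) -> Option<String> {
        let n = self.script_ids.len();
        if n == 0 {
            return None;
        }
        let start = self.cursor % n;
        // Preferred path: next deployment in round-robin order below budget.
        let under_budget = (0..n)
            .map(|offset| (start + offset) % n)
            .find(|&i| self.count(&self.script_ids[i]) < self.budget);
        // Fallback: everything is locally saturated, so keep serving in
        // round-robin order instead of refusing the request.
        let chosen = under_budget.unwrap_or(start);
        self.cursor = (chosen + 1) % n;
        let id = self.script_ids[chosen].clone();
        *self.counts.entry(id.clone()).or_insert(0) += 1;
        Some(id)
    }
}
```

Because the counter is bumped inside `select`, a request whose future is later dropped still counts against the local estimate, which matches the conservative behavior described above.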

Selection remains decoupled from the existing failure blacklist. Blacklisted deployments are still skipped first; the rolling quota ledger only orders otherwise healthy deployments by locally observed capacity. If all deployments are blacklisted, the existing earliest-cooldown recovery path is preserved and the selected deployment is recorded in the ledger.
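
The ordering between the blacklist and the ledger might look like the sketch below. The `Deployment` struct, the `choose` function, and the "least used" tie-break among saturated deployments are assumptions made for illustration; the description only states that blacklisted deployments are skipped first and that the earliest-cooldown path applies when all are blacklisted.

```rust
use std::time::{Duration, Instant};

/// Hypothetical view of one configured deployment.
pub struct Deployment {
    pub script_id: String,
    /// `Some(t)` while the failure blacklist holds this deployment until `t`.
    pub blacklisted_until: Option<Instant>,
    /// Locally observed calls inside the rolling window.
    pub calls_in_window: usize,
}

/// A deployment is healthy if it has no blacklist entry or its cooldown is over.
fn is_healthy(d: &Deployment, now: Instant) -> bool {
    d.blacklisted_until.map_or(true, |until| until <= now)
}

pub fn choose(deployments: &[Deployment], budget: usize, now: Instant) -> Option<&Deployment> {
    // 1. Healthy deployments that are still below the local budget.
    if let Some(d) = deployments
        .iter()
        .filter(|d| is_healthy(d, now))
        .find(|d| d.calls_in_window < budget)
    {
        return Some(d);
    }
    // 2. Healthy but locally saturated: still return one (assumed: least used).
    if let Some(d) = deployments
        .iter()
        .filter(|d| is_healthy(d, now))
        .min_by_key(|d| d.calls_in_window)
    {
        return Some(d);
    }
    // 3. Everything is blacklisted: recover via the earliest cooldown expiry.
    deployments.iter().min_by_key(|d| d.blacklisted_until)
}
```

In all three branches the caller would then record the returned deployment in the ledger, as the description notes for the recovery path.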

The guide now describes the local rolling 24-hour ledger in the Full Mode deployment-scaling section, including the fact that it steers away from deployments this client has already driven near the free-tier request budget. Unit coverage exercises saturated deployment skipping, expired observation pruning, all-saturated connectivity fallback, and parallel selection preferring unsaturated deployments.
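
For the parallel case listed in the unit coverage, one way to prefer unsaturated deployments when picking several at once is sketched here. The function `select_fanout` and its `(script_id, calls_in_window)` input shape are hypothetical, and the input is assumed to contain only non-blacklisted deployments.

```rust
/// Pick up to `k` distinct script IDs for a fan-out, listing deployments
/// below `budget` first and falling back to saturated ones only to fill
/// the remaining slots. Input order is preserved within each group.
pub fn select_fanout(usage: &[(String, usize)], budget: usize, k: usize) -> Vec<String> {
    let (under, saturated): (Vec<_>, Vec<_>) = usage
        .iter()
        .partition(|(_, calls)| *calls < budget);
    under
        .into_iter()
        .chain(saturated)
        .take(k)
        .map(|(id, _)| id.clone())
        .collect()
}
```

Each ID returned here would be recorded in the shared ledger at dispatch, so single-request and fan-out traffic draw from the same local estimate.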

github-actions bot added the "type: feature" label (feat: PR, auto-applied by release-drafter) on May 23, 2026
@CaptainMirage
Contributor

Man, I'm working on an overhaul of the quota system right now. The 24-hour thing is in it too, along with some UI fixes to then update the UI itself.

@maybeknott
Author

> Man, I'm working on an overhaul of the quota system right now. The 24-hour thing is in it too, along with some UI fixes to then update the UI itself.

If you agree and share what's on your roadmap, then from now on I can rebase onto your fork and push my changes there, so we avoid overlapping on those matters.

P.S. By roadmap, I mean what you have in mind beyond what's currently on your fork. I already looked at yours and crossed what you've done off my roadmap.

@CaptainMirage
Contributor

CaptainMirage commented May 24, 2026

> > Man, I'm working on an overhaul of the quota system right now. The 24-hour thing is in it too, along with some UI fixes to then update the UI itself.
>
> If you agree and share what's on your roadmap, then from now on I can rebase onto your fork and push my changes there, so we avoid overlapping on those matters.
>
> P.S. By roadmap, I mean what you have in mind beyond what's currently on your fork. I already looked at yours and crossed what you've done off my roadmap.

I have yet to push the branch I'm working on, sorry lol. I'll do it in a bit, and I do have a few things in mind I want to fix. It would be lovely if the Discussions tab were open on this repo, so I could use the Issues tab for this very specific thing instead of it being filled with user questions rather than actual code issues or bug reports. Currently aleph has been MIA for 2 days, so I'm waiting for him to confirm it for me on the other PR I made for the download thing.

@maybeknott
Author

Closing this standalone quota slice because the quota steering work has been folded into #1388, which now carries the relay batching, quota steering, failure quarantine, and script-health UI together as one coherent review unit.

maybeknott closed this on May 24, 2026